Abstract

This paper describes a fully-automated real-time
broadcast news video and audio processing system.
The system combines speech recognition, machine
translation, and cross-lingual information retrieval
components to enable real-time alerting from live
English and Arabic news sources.


1  Real-time  Video  Alerting 

This paper describes the Enhanced Video Text and Audio
Processing (eViTAP) system, which provides fully-automated
real-time broadcast news video and audio processing. The
system combines state-of-the-art automatic speech recognition
(ASR) and machine translation (MT) components with
cross-lingual information retrieval in order to enable
searching of multilingual video news sources by a monolingual
speaker. In addition to full search capabilities, the system
also enables real-time alerting, such that a user can be
notified as soon as a word, phrase, or topic of interest
appears in an English or Arabic news broadcast.
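
The word and phrase alerting described above can be illustrated with a minimal sketch. It assumes that every transcript segment is already available as English text (ASR output for English sources, MT output for Arabic sources); the Segment type, the function names, and the sample data are hypothetical and are not part of eViTAP. Topic-level alerting is not covered.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    source: str   # broadcast source label (hypothetical)
    start: float  # offset in seconds into the encoded video
    text: str     # English text: ASR output, or MT output for Arabic


def compile_profile(terms):
    """Build one case-insensitive pattern matching any term as whole words."""
    parts = [r"\s+".join(re.escape(w) for w in term.split()) for term in terms]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


def alerts(segments, profile):
    """Yield (segment, matched text) for each segment that matches."""
    for seg in segments:
        match = profile.search(seg.text)
        if match:
            yield seg, match.group(0)


# Illustrative data only; not taken from the system.
stream = [
    Segment("english-feed", 12.0, "The markets opened higher today."),
    Segment("arabic-feed", 47.5, "A new statement from Bin Laden was aired."),
]
profile = compile_profile(["bin laden", "election"])
hits = [(seg.source, seg.start, term) for seg, term in alerts(stream, profile)]
```

Because alerts is a generator, a match is reported as soon as its segment arrives, which mirrors the notify-on-arrival behavior rather than a batch search.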

The key component of the news processing is the Virage
VideoLogger video indexer software package (Virage 2003).
The VideoLogger processes an incoming live satellite feed,
encodes the video as a digital file, and manages the video
and audio processing components. The individual components
integrated into the VideoLogger platform currently include
the audio processing and machine translation systems
described in Section 2, as well as face ID, broadcaster
logo ID, and scene change analysis.

The video and audio processing components produce textual
metadata that is time-stamped to enable synchronization with
the encoded video file. All data is indexed and stored for
retrieval by a cross-lingual information retrieval engine.
Figure 1 shows the eViTAP cross-lingual search and alerting
interface, with real data from the system. The list of
relevant video clips matching an alerting profile or a user
search is shown on the left, with broadcast source and time,
most-frequent named entities, and a relevancy score. Note
that the English query “bin laden” resulted in the display
of relevant stories in both English and Arabic. The center of
the screen contains the video playback window, with clip
navigation and keyframe storyboard. The right of the
interface contains the transcripts from the ASR and MT
engines; video playback is synchronized with the transcripts
such that words are highlighted as they are spoken in the
video.
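
The synchronization between playback and transcript can be sketched as a lookup over word start times. This is an illustration under the assumption that each transcript word carries a start time in seconds relative to the encoded video; the data layout and the word_at function are hypothetical and do not describe the actual VideoLogger metadata format.

```python
from bisect import bisect_right

# (start time in seconds, word); illustrative values only.
words = [(0.00, "the"), (0.21, "president"), (0.80, "said"), (1.10, "today")]
starts = [t for t, _ in words]  # must be sorted in ascending order


def word_at(playback_time):
    """Return the index of the latest word starting at or before
    playback_time, or None if playback is before the first word."""
    i = bisect_right(starts, playback_time) - 1
    return i if i >= 0 else None
```

A player would call word_at on each playback tick and highlight the word at the returned index; the binary search keeps the lookup cheap even for long transcripts.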

2  Real-time  Spoken  Language  Processing 

The real-time audio processing in the eViTAP system is
performed by the BBN Audioindexer system, described in
detail in (Makhoul et al. 2000). The Audioindexer system
provides a wide range of real-time audio processing
components, including automatic speech recognition, speaker
segmentation and identification, topic classification, and
named entity detection. All audio processing is carried out
on a high-end PC (dual 2.6 GHz Xeon CPU, 2 GB RAM). The
real-time speech recognition system produces a word error
rate of roughly 20-30% for English and Arabic news sources.
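
For reference, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below shows that standard definition; it is not the scoring tool used for the figures above, and the sample sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


# Invented example: one substitution and one deletion in ten words.
ref = "the president met with reporters in cairo earlier this morning"
hyp = "the president met with reporter in cairo this morning"
wer = word_error_rate(ref, hyp)
```

In this example two of the ten reference words are in error, which gives a rate of 20%, the lower end of the range reported above.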

Figure 1: Multilingual alerting and search interface, with alert list, synchronized video playback, Arabic speech
recognition output, and Arabic-to-English machine translation output.


The Arabic words produced by the speech recognition
system, including all ASR errors, are processed by an
Arabic-to-English machine translation system that also
operates in real time (on a separate high-end PC). The
eViTAP system currently processes Arabic sources using
statistical machine translation systems from IBM
(Al-Onaizan 2003) and Language Weaver (Benjamin et al.
2003).